Dataset statistics
| Number of variables | 18 |
|---|---|
| Number of observations | 6703 |
| Missing cells | 11207 |
| Missing cells (%) | 9.3% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 942.7 KiB |
| Average record size in memory | 144.0 B |
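The overview figures can be re-derived from counts that appear elsewhere in this report. The following is a minimal sketch, not the profiler's own code: the per-variable missing counts are copied from the variable sections below, and attributing the 3 missing values to start_mifid_days is an inference from the column order. All variable names are illustrative.

```python
# Re-derive the overview figures from counts reported in this profile.
n_variables = 18
n_observations = 6703

# Per-variable missing counts as listed in the variable sections.
# Assumption: the 3 missing values belong to start_mifid_days.
missing_per_variable = {
    "start_mifid_days": 3,
    "finish_mifid_days": 2890,
    "first_deposit_days": 4719,
    "first_trade_investor_account_demo_days": 3595,
}

total_cells = n_variables * n_observations
missing_cells = sum(missing_per_variable.values())
missing_pct = 100 * missing_cells / total_cells

# Total size is reported as 942.7 KiB; 1 KiB = 1024 bytes.
total_bytes = 942.7 * 1024
avg_record_bytes = total_bytes / n_observations

print(f"Missing cells: {missing_cells} ({missing_pct:.1f}%)")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results agree with the table above: the percentage of missing cells is taken over all 18 × 6703 cells, not over rows.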
Variable types
| Categorical | 8 |
|---|---|
| Numeric | 10 |
mifid_money_other_brokers is highly correlated with mifid_invested_other_brokers | High correlation |
mifid_invested_other_brokers is highly correlated with mifid_money_other_brokers | High correlation |
finish_mifid_days has 2890 (43.1%) missing values | Missing |
first_deposit_days has 4719 (70.4%) missing values | Missing |
first_trade_investor_account_demo_days has 3595 (53.6%) missing values | Missing |
start_mifid_days has 4831 (72.1%) zeros | Zeros |
finish_mifid_days has 800 (11.9%) zeros | Zeros |
first_deposit_days has 89 (1.3%) zeros | Zeros |
first_deposit_amount has 4719 (70.4%) zeros | Zeros |
first_deposit_platform has 728 (10.9%) zeros | Zeros |
mifid_actual_savings has 649 (9.7%) zeros | Zeros |
mifid_next_year_savings has 649 (9.7%) zeros | Zeros |
mifid_invested_other_brokers has 3587 (53.5%) zeros | Zeros |
first_trade_investor_account_demo_days has 1804 (26.9%) zeros | Zeros |
Reproduction
| Analysis started | 2021-05-31 14:10:19.713938 |
|---|---|
| Analysis finished | 2021-05-31 14:10:52.623421 |
| Duration | 32.91 seconds |
| Software version | pandas-profiling v2.13.0 |
| Download configuration | config.yaml |
user_currency
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.5 KiB |
| USD | 3270 |
|---|---|
| EUR | 3114 |
| GBP | 317 |
| NO_CURRENCY | 2 |
Length
| Max length | 11 |
|---|---|
| Median length | 3 |
| Mean length | 3.002386991 |
| Min length | 3 |
Characters and Unicode
| Total characters | 20125 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | EUR |
|---|---|
| 2nd row | USD |
| 3rd row | EUR |
| 4th row | EUR |
| 5th row | EUR |
| Value | Count | Frequency (%) |
| USD | 3270 | 48.8% |
| EUR | 3114 | 46.5% |
| GBP | 317 | 4.7% |
| NO_CURRENCY | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| usd | 3270 | |
| eur | 3114 | |
| gbp | 317 | 4.7% |
| no_currency | 2 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| U | 6386 | |
| S | 3270 | |
| D | 3270 | |
| R | 3118 | |
| E | 3116 | |
| G | 317 | 1.6% |
| B | 317 | 1.6% |
| P | 317 | 1.6% |
| N | 4 | < 0.1% |
| C | 4 | < 0.1% |
| Other values (3) | 6 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 20123 | |
| Connector Punctuation | 2 | < 0.1% |
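The character and category tables for user_currency follow directly from the value counts. As a rough sketch (the profiler's internals may differ), the counts can be rebuilt with the standard library. `unicodedata.category` returns two-letter codes, where `Lu` is Uppercase Letter and `Pc` is Connector Punctuation.

```python
from collections import Counter
import unicodedata

# Value counts for user_currency as reported in this profile.
value_counts = {"USD": 3270, "EUR": 3114, "GBP": 317, "NO_CURRENCY": 2}

char_counts = Counter()
category_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        # Each character occurs once per row holding this value.
        char_counts[ch] += count
        category_counts[unicodedata.category(ch)] += count

total_chars = sum(char_counts.values())
n_rows = sum(value_counts.values())
mean_length = total_chars / n_rows

print(char_counts.most_common(5))
print(dict(category_counts))
print(total_chars, len(char_counts), mean_length)
```

The totals match the report: 20125 characters, 13 distinct characters, and only the two underscores from NO_CURRENCY outside the Uppercase Letter category.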
Most frequent character per category
| Value | Count | Frequency (%) |
| U | 6386 | |
| S | 3270 | |
| D | 3270 | |
| R | 3118 | |
| E | 3116 | |
| G | 317 | 1.6% |
| B | 317 | 1.6% |
| P | 317 | 1.6% |
| N | 4 | < 0.1% |
| C | 4 | < 0.1% |
| Other values (2) | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| _ | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 20123 | |
| Common | 2 | < 0.1% |
Most frequent character per script
| Value | Count | Frequency (%) |
| U | 6386 | |
| S | 3270 | |
| D | 3270 | |
| R | 3118 | |
| E | 3116 | |
| G | 317 | 1.6% |
| B | 317 | 1.6% |
| P | 317 | 1.6% |
| N | 4 | < 0.1% |
| C | 4 | < 0.1% |
| Other values (2) | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| _ | 2 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 20125 |
Most frequent character per block
| Value | Count | Frequency (%) |
| U | 6386 | |
| S | 3270 | |
| D | 3270 | |
| R | 3118 | |
| E | 3116 | |
| G | 317 | 1.6% |
| B | 317 | 1.6% |
| P | 317 | 1.6% |
| N | 4 | < 0.1% |
| C | 4 | < 0.1% |
| Other values (3) | 6 | < 0.1% |
user_country
Real number (ℝ≥0)
| Distinct | 122 |
|---|---|
| Distinct (%) | 1.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 47.52290019 |
| Minimum | 0 |
|---|---|
| Maximum | 121 |
| Zeros | 11 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 8 |
| Q1 | 31 |
| median | 36 |
| Q3 | 66.5 |
| 95-th percentile | 114 |
| Maximum | 121 |
| Range | 121 |
| Interquartile range (IQR) | 35.5 |
Descriptive statistics
| Standard deviation | 29.97011402 |
|---|---|
| Coefficient of variation (CV) | 0.6306457286 |
| Kurtosis | 0.1365035255 |
| Mean | 47.52290019 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 1.040899302 |
| Sum | 318546 |
| Variance | 898.2077342 |
| Monotonicity | Not monotonic |
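Several of the user_country statistics above are simple functions of the others. The sketch below checks those relationships using only the figures printed in this report; the variable names are illustrative.

```python
# Figures reported for user_country in this profile.
n = 6703
total = 318546
std = 29.97011402
minimum, q1, q3, maximum = 0, 31, 66.5, 121

mean = total / n               # Mean = Sum / number of observations
variance = std ** 2            # Variance = standard deviation squared
cv = std / mean                # Coefficient of variation = std / mean
iqr = q3 - q1                  # Interquartile range = Q3 - Q1
value_range = maximum - minimum

print(mean, variance, cv, iqr, value_range)
```

The same identities hold for every numeric variable in the report, which makes them a quick sanity check when a table has been transcribed by hand.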
| Value | Count | Frequency (%) |
| 36 | 2201 | 32.8% |
| 41 | 368 | 5.5% |
| 77 | 354 | 5.3% |
| 25 | 329 | 4.9% |
| 6 | 270 | 4.0% |
| 38 | 219 | 3.3% |
| 85 | 209 | 3.1% |
| 120 | 201 | 3.0% |
| 22 | 196 | 2.9% |
| 24 | 183 | 2.7% |
| Other values (112) | 2173 | 32.4% |
| Value | Count | Frequency (%) |
| 0 | 11 | 0.2% |
| 1 | 35 | |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 121 | 63 | 0.9% |
| 120 | 201 | |
| 119 | 2 | < 0.1% |
| 118 | 1 | < 0.1% |
| 117 | 39 | 0.6% |
start_mifid_days
Real number (ℝ≥0)
| Distinct | 354 |
|---|---|
| Distinct (%) | 5.3% |
| Missing | 3 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 20.49328358 |
| Minimum | 0 |
|---|---|
| Maximum | 1090 |
| Zeros | 4831 |
| Zeros (%) | 72.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 116 |
| Maximum | 1090 |
| Range | 1090 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 83.03379577 |
|---|---|
| Coefficient of variation (CV) | 4.05175654 |
| Kurtosis | 47.55451125 |
| Mean | 20.49328358 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 6.274375994 |
| Sum | 137305 |
| Variance | 6894.61124 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 4831 | 72.1% |
| 1 | 338 | 5.0% |
| 2 | 140 | 2.1% |
| 3 | 99 | 1.5% |
| 4 | 63 | 0.9% |
| 5 | 61 | 0.9% |
| 7 | 57 | 0.9% |
| 6 | 45 | 0.7% |
| 8 | 38 | 0.6% |
| 11 | 31 | 0.5% |
| Other values (344) | 997 | 14.9% |
| Value | Count | Frequency (%) |
| 0 | 4831 | |
| 1 | 338 | 5.0% |
| 2 | 140 | 2.1% |
| 3 | 99 | 1.5% |
| 4 | 63 | 0.9% |
| Value | Count | Frequency (%) |
| 1090 | 2 | |
| 1038 | 1 | |
| 967 | 1 | |
| 882 | 1 | |
| 856 | 1 |
has_finished_mifid
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.5 KiB |
| 1 | 3811 |
|---|---|
| 0 | 2892 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 6703 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 0 |
| Value | Count | Frequency (%) |
| 1 | 3811 | 56.9% |
| 0 | 2892 | 43.1% |
| Value | Count | Frequency (%) |
| 1 | 3811 | |
| 0 | 2892 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 3811 | |
| 0 | 2892 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 6703 |
Most frequent character per category
| Value | Count | Frequency (%) |
| 1 | 3811 | |
| 0 | 2892 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 6703 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 1 | 3811 | |
| 0 | 2892 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 6703 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 1 | 3811 | |
| 0 | 2892 |
finish_mifid_days
Real number (ℝ≥0)
| Distinct | 350 |
|---|---|
| Distinct (%) | 9.2% |
| Missing | 2890 |
| Missing (%) | 43.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 36.05691057 |
| Minimum | 0 |
|---|---|
| Maximum | 1090 |
| Zeros | 800 |
| Zeros (%) | 11.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 2 |
| Q3 | 13 |
| 95-th percentile | 221 |
| Maximum | 1090 |
| Range | 1090 |
| Interquartile range (IQR) | 12 |
Descriptive statistics
| Standard deviation | 105.7544829 |
|---|---|
| Coefficient of variation (CV) | 2.932987915 |
| Kurtosis | 26.9007546 |
| Mean | 36.05691057 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 4.765347049 |
| Sum | 137485 |
| Variance | 11184.01066 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 834 | 12.4% |
| 0 | 800 | 11.9% |
| 2 | 398 | 5.9% |
| 3 | 233 | 3.5% |
| 4 | 135 | 2.0% |
| 5 | 101 | 1.5% |
| 6 | 75 | 1.1% |
| 7 | 65 | 1.0% |
| 8 | 62 | 0.9% |
| 10 | 52 | 0.8% |
| Other values (340) | 1058 | 15.8% |
| (Missing) | 2890 | 43.1% |
| Value | Count | Frequency (%) |
| 0 | 800 | |
| 1 | 834 | |
| 2 | 398 | |
| 3 | 233 | 3.5% |
| 4 | 135 | 2.0% |
| Value | Count | Frequency (%) |
| 1090 | 1 | |
| 1040 | 1 | |
| 968 | 1 | |
| 936 | 1 | |
| 882 | 1 |
has_deposit
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.5 KiB |
| 0 | 4719 |
|---|---|
| 1 | 1984 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 6703 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
| Value | Count | Frequency (%) |
| 0 | 4719 | 70.4% |
| 1 | 1984 | 29.6% |
| Value | Count | Frequency (%) |
| 0 | 4719 | |
| 1 | 1984 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 4719 | |
| 1 | 1984 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 6703 |
Most frequent character per category
| Value | Count | Frequency (%) |
| 0 | 4719 | |
| 1 | 1984 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 6703 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 0 | 4719 | |
| 1 | 1984 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 6703 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 0 | 4719 | |
| 1 | 1984 |
first_deposit_days
Real number (ℝ≥0)
| Distinct | 312 |
|---|---|
| Distinct (%) | 15.7% |
| Missing | 4719 |
| Missing (%) | 70.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 59.60685484 |
| Minimum | 0 |
|---|---|
| Maximum | 1050 |
| Zeros | 89 |
| Zeros (%) | 1.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 4 |
| median | 11 |
| Q3 | 47 |
| 95-th percentile | 303.85 |
| Maximum | 1050 |
| Range | 1050 |
| Interquartile range (IQR) | 43 |
Descriptive statistics
| Standard deviation | 126.7542907 |
|---|---|
| Coefficient of variation (CV) | 2.126505265 |
| Kurtosis | 17.9245195 |
| Mean | 59.60685484 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 3.849901044 |
| Sum | 118260 |
| Variance | 16066.6502 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 148 | 2.2% |
| 2 | 139 | 2.1% |
| 3 | 102 | 1.5% |
| 5 | 98 | 1.5% |
| 4 | 94 | 1.4% |
| 0 | 89 | 1.3% |
| 6 | 80 | 1.2% |
| 8 | 68 | 1.0% |
| 7 | 65 | 1.0% |
| 10 | 42 | 0.6% |
| Other values (302) | 1059 | 15.8% |
| (Missing) | 4719 | 70.4% |
| Value | Count | Frequency (%) |
| 0 | 89 | |
| 1 | 148 | |
| 2 | 139 | |
| 3 | 102 | |
| 4 | 94 |
| Value | Count | Frequency (%) |
| 1050 | 1 | |
| 1042 | 1 | |
| 1006 | 1 | |
| 984 | 1 | |
| 956 | 1 |
first_deposit_amount
Real number (ℝ≥0)
| Distinct | 270 |
|---|---|
| Distinct (%) | 4.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.853190752 |
| Minimum | 0 |
|---|---|
| Maximum | 1000 |
| Zeros | 4719 |
| Zeros (%) | 70.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1.929161201 |
| 95-th percentile | 19.29161201 |
| Maximum | 1000 |
| Range | 1000 |
| Interquartile range (IQR) | 1.929161201 |
Descriptive statistics
| Standard deviation | 29.74785421 |
|---|---|
| Coefficient of variation (CV) | 6.129545638 |
| Kurtosis | 432.5593816 |
| Mean | 4.853190752 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 17.84337111 |
| Sum | 32530.93761 |
| Variance | 884.9348298 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 4719 | 70.4% |
| 1.929161201 | 564 | 8.4% |
| 3.858322401 | 385 | 5.7% |
| 7.716644803 | 135 | 2.0% |
| 19.29161201 | 116 | 1.7% |
| 38.58322401 | 82 | 1.2% |
| 11.5749672 | 71 | 1.1% |
| 2.314993441 | 60 | 0.9% |
| 5.787483602 | 43 | 0.6% |
| 9.645806004 | 26 | 0.4% |
| Other values (260) | 502 | 7.5% |
| Value | Count | Frequency (%) |
| 0 | 4719 | |
| 0.03858322401 | 1 | < 0.1% |
| 0.1022455436 | 1 | < 0.1% |
| 0.1736245081 | 1 | < 0.1% |
| 0.1929161201 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 1000 | 1 | < 0.1% |
| 771.6644803 | 2 | < 0.1% |
| 771.5487306 | 1 | < 0.1% |
| 462.8154179 | 1 | < 0.1% |
| 385.8322401 | 7 |
first_deposit_platform
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.557064001 |
| Minimum | 0 |
|---|---|
| Maximum | 6 |
| Zeros | 728 |
| Zeros (%) | 10.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 5 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.524490366 |
|---|---|
| Coefficient of variation (CV) | 0.9790800926 |
| Kurtosis | 1.272916836 |
| Mean | 1.557064001 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.638068951 |
| Sum | 10437 |
| Variance | 2.324070877 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 4719 | 70.4% |
| 5 | 854 | 12.7% |
| 0 | 728 | 10.9% |
| 3 | 266 | 4.0% |
| 6 | 69 | 1.0% |
| 4 | 51 | 0.8% |
| 2 | 16 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 728 | 10.9% |
| 1 | 4719 | |
| 2 | 16 | 0.2% |
| 3 | 266 | 4.0% |
| 4 | 51 | 0.8% |
| Value | Count | Frequency (%) |
| 6 | 69 | 1.0% |
| 5 | 854 | |
| 4 | 51 | 0.8% |
| 3 | 266 | 4.0% |
| 2 | 16 | 0.2% |
mifid_actual_savings
Real number (ℝ≥0)
| Distinct | 12 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.750410264 |
| Minimum | 0 |
|---|---|
| Maximum | 15 |
| Zeros | 649 |
| Zeros (%) | 9.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 6 |
| median | 9 |
| Q3 | 12 |
| 95-th percentile | 13 |
| Maximum | 15 |
| Range | 15 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 4.04697566 |
|---|---|
| Coefficient of variation (CV) | 0.4624898191 |
| Kurtosis | -0.4249155079 |
| Mean | 8.750410264 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | -0.7458739295 |
| Sum | 58654 |
| Variance | 16.37801199 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 12 | 2071 | 30.9% |
| 13 | 1021 | 15.2% |
| 5 | 747 | 11.1% |
| 0 | 649 | 9.7% |
| 7 | 645 | 9.6% |
| 6 | 639 | 9.5% |
| 8 | 430 | 6.4% |
| 9 | 265 | 4.0% |
| 10 | 130 | 1.9% |
| 11 | 64 | 1.0% |
| Other values (2) | 42 | 0.6% |
| Value | Count | Frequency (%) |
| 0 | 649 | |
| 1 | 1 | < 0.1% |
| 5 | 747 | |
| 6 | 639 | |
| 7 | 645 |
| Value | Count | Frequency (%) |
| 15 | 41 | 0.6% |
| 13 | 1021 | |
| 12 | 2071 | |
| 11 | 64 | 1.0% |
| 10 | 130 | 1.9% |
mifid_next_year_savings
Real number (ℝ≥0)
| Distinct | 12 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.355512457 |
| Minimum | 0 |
|---|---|
| Maximum | 15 |
| Zeros | 649 |
| Zeros (%) | 9.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 6 |
| median | 8 |
| Q3 | 12 |
| 95-th percentile | 13 |
| Maximum | 15 |
| Range | 15 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 4.081419453 |
|---|---|
| Coefficient of variation (CV) | 0.4884702733 |
| Kurtosis | -0.7078882074 |
| Mean | 8.355512457 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | -0.4779895924 |
| Sum | 56007 |
| Variance | 16.65798475 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 12 | 1412 | 21.1% |
| 13 | 1264 | 18.9% |
| 5 | 1002 | 14.9% |
| 6 | 820 | 12.2% |
| 7 | 714 | 10.7% |
| 0 | 649 | 9.7% |
| 8 | 404 | 6.0% |
| 9 | 216 | 3.2% |
| 10 | 109 | 1.6% |
| 11 | 61 | 0.9% |
| Other values (2) | 52 | 0.8% |
| Value | Count | Frequency (%) |
| 0 | 649 | |
| 1 | 1 | < 0.1% |
| 5 | 1002 | |
| 6 | 820 | |
| 7 | 714 |
| Value | Count | Frequency (%) |
| 15 | 51 | 0.8% |
| 13 | 1264 | |
| 12 | 1412 | |
| 11 | 61 | 0.9% |
| 10 | 109 | 1.6% |
mifid_qualifications
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.5 KiB |
| 0 | 3550 |
|---|---|
| 1 | 3153 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 6703 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 1 |
| Value | Count | Frequency (%) |
| 0 | 3550 | 53.0% |
| 1 | 3153 | 47.0% |
| Value | Count | Frequency (%) |
| 0 | 3550 | |
| 1 | 3153 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 3550 | |
| 1 | 3153 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 6703 |
Most frequent character per category
| Value | Count | Frequency (%) |
| 0 | 3550 | |
| 1 | 3153 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 6703 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 0 | 3550 | |
| 1 | 3153 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 6703 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 0 | 3550 | |
| 1 | 3153 |
mifid_experience
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.5 KiB |
| 0 | 4589 |
|---|---|
| 1 | 2114 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 6703 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 0 |
| Value | Count | Frequency (%) |
| 0 | 4589 | 68.5% |
| 1 | 2114 | 31.5% |
| Value | Count | Frequency (%) |
| 0 | 4589 | |
| 1 | 2114 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 4589 | |
| 1 | 2114 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 6703 |
Most frequent character per category
| Value | Count | Frequency (%) |
| 0 | 4589 | |
| 1 | 2114 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 6703 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 0 | 4589 | |
| 1 | 2114 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 6703 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 0 | 4589 | |
| 1 | 2114 |
mifid_money_other_brokers
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.5 KiB |
| 0 | 3587 |
|---|---|
| 1 | 3116 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 6703 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 1 |
| Value | Count | Frequency (%) |
| 0 | 3587 | 53.5% |
| 1 | 3116 | 46.5% |
| Value | Count | Frequency (%) |
| 0 | 3587 | |
| 1 | 3116 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 3587 | |
| 1 | 3116 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 6703 |
Most frequent character per category
| Value | Count | Frequency (%) |
| 0 | 3587 | |
| 1 | 3116 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 6703 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 0 | 3587 | |
| 1 | 3116 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 6703 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 0 | 3587 | |
| 1 | 3116 |
mifid_invested_other_brokers
Real number (ℝ≥0)
| Distinct | 11 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.656422497 |
| Minimum | 0 |
|---|---|
| Maximum | 15 |
| Zeros | 3587 |
| Zeros (%) | 53.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 12 |
| 95-th percentile | 13 |
| Maximum | 15 |
| Range | 15 |
| Interquartile range (IQR) | 12 |
Descriptive statistics
| Standard deviation | 5.399632781 |
|---|---|
| Coefficient of variation (CV) | 1.159609718 |
| Kurtosis | -1.521028421 |
| Mean | 4.656422497 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 0.4913490494 |
| Sum | 31212 |
| Variance | 29.15603417 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 3587 | 53.5% |
| 12 | 1358 | 20.3% |
| 13 | 515 | 7.7% |
| 5 | 395 | 5.9% |
| 6 | 307 | 4.6% |
| 7 | 237 | 3.5% |
| 8 | 151 | 2.3% |
| 9 | 91 | 1.4% |
| 10 | 28 | 0.4% |
| 11 | 18 | 0.3% |
| Value | Count | Frequency (%) |
| 0 | 3587 | |
| 5 | 395 | 5.9% |
| 6 | 307 | 4.6% |
| 7 | 237 | 3.5% |
| 8 | 151 | 2.3% |
| Value | Count | Frequency (%) |
| 15 | 16 | 0.2% |
| 13 | 515 | 7.7% |
| 12 | 1358 | |
| 11 | 18 | 0.3% |
| 10 | 28 | 0.4% |
user_flow_name
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.5 KiB |
| 3 | 3455 |
|---|---|
| 0 | 3017 |
| 2 | 202 |
| 1 | 29 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 6703 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 3 |
| 3rd row | 3 |
| 4th row | 3 |
| 5th row | 3 |
| Value | Count | Frequency (%) |
| 3 | 3455 | 51.5% |
| 0 | 3017 | 45.0% |
| 2 | 202 | 3.0% |
| 1 | 29 | 0.4% |
| Value | Count | Frequency (%) |
| 3 | 3455 | |
| 0 | 3017 | |
| 2 | 202 | 3.0% |
| 1 | 29 | 0.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 3455 | |
| 0 | 3017 | |
| 2 | 202 | 3.0% |
| 1 | 29 | 0.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 6703 |
Most frequent character per category
| Value | Count | Frequency (%) |
| 3 | 3455 | |
| 0 | 3017 | |
| 2 | 202 | 3.0% |
| 1 | 29 | 0.4% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 6703 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 3 | 3455 | |
| 0 | 3017 | |
| 2 | 202 | 3.0% |
| 1 | 29 | 0.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 6703 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 3 | 3455 | |
| 0 | 3017 | |
| 2 | 202 | 3.0% |
| 1 | 29 | 0.4% |
first_trade_investor_account_demo_days
Real number (ℝ≥0)
| Distinct | 206 |
|---|---|
| Distinct (%) | 6.6% |
| Missing | 3595 |
| Missing (%) | 53.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16.84137709 |
| Minimum | 0 |
|---|---|
| Maximum | 957 |
| Zeros | 1804 |
| Zeros (%) | 26.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 52.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 3 |
| 95-th percentile | 88.65 |
| Maximum | 957 |
| Range | 957 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 67.47409902 |
|---|---|
| Coefficient of variation (CV) | 4.006447849 |
| Kurtosis | 58.25967699 |
| Mean | 16.84137709 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 6.765513322 |
| Sum | 52343 |
| Variance | 4552.754039 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 1804 | 26.9% |
| 1 | 313 | 4.7% |
| 2 | 202 | 3.0% |
| 3 | 103 | 1.5% |
| 4 | 61 | 0.9% |
| 5 | 60 | 0.9% |
| 7 | 36 | 0.5% |
| 6 | 35 | 0.5% |
| 8 | 22 | 0.3% |
| 10 | 20 | 0.3% |
| Other values (196) | 452 | 6.7% |
| (Missing) | 3595 | 53.6% |
| Value | Count | Frequency (%) |
| 0 | 1804 | |
| 1 | 313 | 4.7% |
| 2 | 202 | 3.0% |
| 3 | 103 | 1.5% |
| 4 | 61 | 0.9% |
| Value | Count | Frequency (%) |
| 957 | 1 | |
| 898 | 1 | |
| 852 | 1 | |
| 737 | 1 | |
| 691 | 1 |
conversion
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.5 KiB |
| 0 | 5076 |
|---|---|
| 1 | 1627 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 6703 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
| Value | Count | Frequency (%) |
| 0 | 5076 | 75.7% |
| 1 | 1627 | 24.3% |
| Value | Count | Frequency (%) |
| 0 | 5076 | |
| 1 | 1627 | 24.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 5076 | |
| 1 | 1627 | 24.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 6703 |
Most frequent character per category
| Value | Count | Frequency (%) |
| 0 | 5076 | |
| 1 | 1627 | 24.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 6703 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 0 | 5076 | |
| 1 | 1627 | 24.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 6703 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 0 | 5076 | |
| 1 | 1627 | 24.3% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
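The calculation described above can be sketched in plain Python. This is an illustration on made-up data, not the routine used by the profiler; `pearson_r` is a hypothetical helper name, and it assumes neither variable is constant (otherwise the denominator is zero).

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)

# Shifting and rescaling (with positive factors) leaves r unchanged.
r_scaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_scaled)
```

The second call illustrates the invariance property: changing location and (positive) scale of either variable gives the same coefficient.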
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
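A minimal sketch of this definition: rank both variables, then apply the Pearson formula to the ranks. The helper names are hypothetical, ties receive the average of their positions (a common convention, assumed here), and the data are made up.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [a ** 3 for a in x]  # monotonic in x, but not linear
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this example ρ reaches 1 because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.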
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
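The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition on made-up data. `kendall_tau_a` is a hypothetical helper name, and statistical libraries often default to the tie-adjusted tau-b instead, so results may differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # the variables move in opposite directions
        # s == 0 is a tie in x or y and counts toward neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)
print(tau)
```

Of the 10 pairs here, 8 are concordant and 2 are discordant, giving τ = (8 - 2) / 10.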
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
First rows
| | user_currency | user_country | start_mifid_days | has_finished_mifid | finish_mifid_days | has_deposit | first_deposit_days | first_deposit_amount | first_deposit_platform | mifid_actual_savings | mifid_next_year_savings | mifid_qualifications | mifid_experience | mifid_money_other_brokers | mifid_invested_other_brokers | user_flow_name | first_trade_investor_account_demo_days | conversion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EUR | 36 | 0.0 | 0 | NaN | 0 | NaN | 0.0 | 1 | 8 | 8 | 0 | 0 | 1 | 8 | 0 | NaN | 0 |
| 1 | USD | 77 | 0.0 | 0 | NaN | 0 | NaN | 0.0 | 1 | 12 | 13 | 0 | 0 | 1 | 12 | 3 | NaN | 0 |
| 2 | EUR | 29 | 0.0 | 1 | 0.0 | 0 | NaN | 0.0 | 1 | 8 | 8 | 1 | 1 | 1 | 5 | 3 | NaN | 0 |
| 3 | EUR | 36 | 0.0 | 0 | NaN | 0 | NaN | 0.0 | 1 | 12 | 12 | 0 | 0 | 0 | 0 | 3 | NaN | 0 |
| 4 | EUR | 36 | 0.0 | 0 | NaN | 0 | NaN | 0.0 | 1 | 7 | 7 | 1 | 0 | 1 | 8 | 3 | NaN | 0 |
| 5 | EUR | 36 | 0.0 | 0 | NaN | 0 | NaN | 0.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | NaN | 0 |
| 6 | EUR | 36 | 0.0 | 1 | 3.0 | 0 | NaN | 0.0 | 1 | 12 | 12 | 1 | 0 | 1 | 12 | 2 | 0.0 | 0 |
| 7 | USD | 25 | 0.0 | 1 | 1.0 | 0 | NaN | 0.0 | 1 | 12 | 13 | 0 | 0 | 1 | 12 | 3 | 1.0 | 0 |
| 8 | EUR | 24 | 0.0 | 1 | 0.0 | 0 | NaN | 0.0 | 1 | 8 | 6 | 0 | 0 | 0 | 0 | 3 | NaN | 0 |
| 9 | USD | 26 | 0.0 | 1 | 3.0 | 0 | NaN | 0.0 | 1 | 12 | 13 | 1 | 0 | 0 | 0 | 3 | NaN | 0 |
Last rows
| | user_currency | user_country | start_mifid_days | has_finished_mifid | finish_mifid_days | has_deposit | first_deposit_days | first_deposit_amount | first_deposit_platform | mifid_actual_savings | mifid_next_year_savings | mifid_qualifications | mifid_experience | mifid_money_other_brokers | mifid_invested_other_brokers | user_flow_name | first_trade_investor_account_demo_days | conversion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6693 | EUR | 36 | 0.0 | 1 | 1.0 | 1 | 3.0 | 11.574967 | 5 | 6 | 13 | 1 | 1 | 0 | 0 | 0 | 26.0 | 1 |
| 6694 | EUR | 58 | 0.0 | 1 | 0.0 | 1 | 199.0 | 3.858322 | 5 | 13 | 13 | 1 | 0 | 0 | 0 | 0 | NaN | 1 |
| 6695 | USD | 91 | 811.0 | 1 | 829.0 | 1 | 1006.0 | 1.929161 | 0 | 13 | 5 | 1 | 0 | 1 | 12 | 0 | NaN | 0 |
| 6696 | EUR | 58 | 0.0 | 1 | 0.0 | 1 | 2.0 | 1.929161 | 5 | 12 | 12 | 1 | 0 | 0 | 0 | 0 | 0.0 | 1 |
| 6697 | EUR | 36 | 0.0 | 0 | NaN | 0 | NaN | 0.000000 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | 0 |
| 6698 | EUR | 36 | 246.0 | 1 | 246.0 | 1 | 248.0 | 4.629987 | 5 | 12 | 12 | 0 | 0 | 0 | 0 | 0 | 0.0 | 1 |
| 6699 | EUR | 38 | 0.0 | 0 | NaN | 0 | NaN | 0.000000 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0 |
| 6700 | EUR | 58 | 0.0 | 1 | 0.0 | 1 | 63.0 | 3.858322 | 5 | 5 | 5 | 1 | 1 | 0 | 0 | 0 | 0.0 | 1 |
| 6701 | USD | 20 | 167.0 | 1 | 169.0 | 1 | 196.0 | 1.929161 | 0 | 12 | 12 | 0 | 0 | 1 | 12 | 0 | 0.0 | 0 |
| 6702 | USD | 84 | 0.0 | 0 | NaN | 0 | NaN | 0.000000 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | 0 |